Pythonistas

Who is a Python user?

useRs

Who is an R user?

Who uses both?

R / Python

Can’t we all just get along?

Andrew B. Collier

PyData (Berlin)

8 July 2018

#python #rstats #rpy2 #reticulate @projectjupyter @rstudio
@datawookie

R vs Python

Language rankings (Redmonk, Q1 2018).

Language ratings over time (Redmonk, Q1 2018).

R is…

  • the grande dame (first release in 1995)
  • more than 10k packages on CRAN
  • myriad builtin data sets
  • vast statistical and numerical capabilities
  • excellent (but sometimes opaque) documentation and
  • well suited to standalone analyses

but

  • can be slow and a memory glutton
  • steep learning curve!

Python is…

  • the Data Science debutante (although first release in 1991)
  • explicitly object oriented
  • encourages structured code
  • general purpose and
  • well suited to integration and production systems

but

  • no native Data Science capabilities
  • module dependencies can be a problem!

How can we leverage the best features of both languages?

What about blending them together?

Don’t think twice about mixing in a bit of SQL.

# R
#
dbGetQuery(db, "SELECT * FROM customer;")
# Python
#
db.execute("SELECT * FROM employee;")
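
The Python half of this can be tried without a database server. The following is a minimal sketch using the standard-library sqlite3 module; the in-memory database, the employee table layout and the two sample rows are assumptions made purely for illustration, and only the SELECT statement comes from the slide.

```python
import sqlite3

# In-memory database, so the sketch needs no server or file (assumption).
db = sqlite3.connect(":memory:")

# Hypothetical table and rows, created only so the query has something to return.
db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO employee (name) VALUES (?)", [("Ada",), ("Grace",)])

# The SQL string is passed through unchanged, exactly as in the slide.
rows = db.execute("SELECT * FROM employee;").fetchall()

db.close()
```

The SQL is just a string handed to another engine, which is the same pattern the R call to dbGetQuery() follows.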

Here are some options:

  • \(\text{R} \subset \text{Python}\) (R within Python)
  • \(\text{Python} \subset \text{R}\) (Python within R) or
  • \(\{\text{R}, \text{Python}\} \subset\) some other language.

Python feat. R

Alternatives

  • PypeR
    • release 1.1.2 (2014)
  • pyRserve
    • uses Rserve (access R via RPC)
    • R process can run on separate machine
    • release 0.9.1 (2017)
  • rpy2
    • runs embedded R in a Python process
    • release 2.9.4 (2018)
  • rpy2 installs via pip (a Docker image with Jupyter is also available).
pip install rpy2
docker run --rm -p 8888:8888 rpy2/jupyter

R feat. Python

Alternatives

  • Install from CRAN or GitHub.
# - From CRAN
install.packages("reticulate")
# - From GitHub
devtools::install_github("rstudio/reticulate")

\(+\)

rpy2

  • run R code
  • Jupyter magic: %R and %%R
  • -i and -o for block magic
  • pandas2ri conversions
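
As a rough illustration of "run R code" from Python, the sketch below calls R's mean() through rpy2.robjects. It assumes that both R and rpy2 are installed; the function name mean_via_r is hypothetical and is not part of rpy2.

```python
def mean_via_r(values):
    """Return the mean of `values`, computed by R's mean() via rpy2."""
    # Imported lazily so this module still loads where R or rpy2 is missing.
    import rpy2.robjects as ro

    r_mean = ro.r["mean"]                     # look up the R function by name
    result = r_mean(ro.FloatVector(values))   # Python floats -> R numeric vector
    return result[0]                          # R returns a length-1 vector
```

With R available, mean_via_r([1.0, 2.0, 3.0, 4.0]) hands the numbers to the embedded R process and brings the answer back as a Python float. In a Jupyter notebook the %R and %%R magics cover the same ground without the explicit function lookup.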

reticulate

  • run Python code
  • virtual environments
  • import() modules
  • repl_python()
  • py$ and r. special objects

Clearly both languages are important.

A Data Scientist should be at least conversant in both of them.

Does it need to be an exclusive relationship?

We can divide our allegiance between both languages!

You will have the best of both worlds.